Creators/Authors contains: "Olney, Andrew M"

  1. Moore, S ; Stamper, J ; Cao, T ; Liu, Z ; Hu, X ; Lu, Y ; Liang, J ; Khosravi, H ; Denny, P ; Singh, A (Ed.)
    Multiple choice questions are traditionally expensive to produce. Recent advances in large language models (LLMs) have led to fine-tuned LLMs that generate questions competitive with human-authored questions. However, the relative capabilities of ChatGPT-family models have not yet been established for this task. We present a carefully controlled human evaluation of three conditions: a fine-tuned, augmented version of Macaw, instruction-tuned Bing Chat with zero-shot prompting, and human-authored questions from a college science textbook. Our results indicate that on six of seven measures tested, both LLMs' performance was not significantly different from human performance. Analysis of LLM errors further suggests that Macaw and Bing Chat have different failure modes for this task: Macaw tends to repeat answer options, whereas Bing Chat tends not to include the specified answer in the answer options. For Macaw, removing error items from analysis results in performance on par with humans for all metrics; for Bing Chat, removing error items improves performance but does not reach human-level performance.
    Free, publicly-accessible full text available September 7, 2024
  2. Fancsali, Stephen E. ; Rus, Vasile (Ed.)

    Multi-angle question answering models have recently been proposed that promise to perform related tasks like question generation. However, performance on related tasks has not been thoroughly studied. We investigate a leading model called Macaw on the task of multiple-choice question generation and evaluate its performance on three angles that systematically reduce the complexity of the task. Our results indicate that despite the promise of generalization, Macaw performs poorly on untrained angles. Even on a trained angle, Macaw fails to generate four distinct multiple-choice options on 17% of inputs. We propose augmenting multiple-choice options by paraphrasing angle input and show this increases overall success to 97.5%. A human evaluation comparing the augmented multiple-choice questions with textbook questions on the same topic reveals that Macaw questions broadly score highly but below human questions.
  3. Since its global emergence in 2020, severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) has caused multiple epidemics in the United States. When medical treatments for the virus were still emerging and a vaccine was not yet available, state and local governments sought to limit its spread by enacting various social-distancing interventions, such as school closures and lockdowns; however, the effectiveness of these interventions was unknown. We applied an established, semimechanistic Bayesian hierarchical model of these interventions to the spread of SARS-CoV-2 from Europe to the United States, using case fatalities from February 29, 2020, up to April 25, 2020, when some states began reversing their interventions. We estimated the effects of interventions across all states, contrasted the estimated reproduction numbers before and after lockdown for each state, and contrasted the predicted number of future fatalities with the actual number of fatalities as a check of the model’s validity. Overall, school closures and lockdowns were the only interventions modeled that had a reliable impact on the time-varying reproduction number, and lockdown appears to have played a key role in reducing that number to below 1.0. We conclude that reversal of lockdown without implementation of additional, equally effective interventions will enable continued, sustained transmission of SARS-CoV-2 in the United States.